A Fast Algorithms for Making Suffix Arrays and for Burrows-Wheeler Transformation

نویسنده

  • Kunihiko Sadakane
چکیده

We propose a fast and memory e cient algorithm for sorting su xes of a text in lexicographic order. It is important to sort su xes because an array of indexes of su xes is called su x array and it is a memory e cient alternative of the su x tree. Sorting su xes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desired. We compare algorithms for making su x arrays of Bentley-Sedgewick, Andersson-Nilsson and Karp-Miller-Rosenberg and making su x trees of Larsson on speed and required memory and propose a new algorithm which is fast and memory e cient by combining them. We also de ne a measure of di culty of sorting su xes: average match length. Our algorithm is e ective when the average match length of a text is large, especially for large databases.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Fast Algorithm for Making Su x Arrays and for Burrows-Wheeler Transformation

We propose a fast and memory e cient algorithm for sorting su xes of a text in lexicographic order. It is important to sort su xes because an array of indexes of su xes is called su x array and it is a memory e cient alternative of the su x tree. Sorting su xes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desire...

متن کامل

On the combinatorics of suffix arrays

We prove several combinatorial properties of suffix arrays, including a characterization of suffix arrays through a bijection with a certain well-defined class of permutations. Our approach is based on the characterization of Burrows-Wheeler arrays given in [1], that we apply by reducing suffix sorting to cyclic shift sorting through the use of an additional sentinel symbol. We show that the ch...

متن کامل

Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT)

MOTIVATION Over the last few years, methods based on suffix arrays using the Burrows-Wheeler Transform have been widely used for DNA sequence read matching and assembly. These provide very fast search algorithms, linear in the search pattern size, on a highly compressible representation of the dataset being searched. Meanwhile, algorithmic development for genotype data has concentrated on stati...

متن کامل

Entropy-Compressed Indexes for Multidimensional Pattern Matching

In this talk, we will discuss the challenges involved in developing a multidimensional generalizations of compressed text indexing structures. These structures depend on some notion of Burrows-Wheeler transform (BWT) for multiple dimensions, though naive generalizations do not enable multidimensional pattern matching. We study the 2D case to possibly highlight combinatorial properties that do n...

متن کامل

A Modified Burrows-Wheeler Transformation for Case-Insensitive Search with Application to Suffix Array Compression

Now the Block sorting compression [l] becomes common by its good balance of compression ratio and speed. It has another nice feature, which is the relation between encoding/decoding process and suffix array. The suffix array [2] is a memory-efficient data structure for searching any substring of a text. It is an array of lexicographically sorted pointers to suffixes of a text. It is also used f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998